Off-Topic Detection In Conversational Telephone Speech

Authors

  • Robin Stewart
  • Andrea Danyluk
  • Yang Liu
Abstract

In a context where information retrieval is extended to spoken “documents”, including conversations, it will be important to give users the ability to seek informational content rather than the socially motivated small talk that appears in many conversational sources. In this paper we present a preliminary study aimed at automatically identifying “irrelevance” in the domain of telephone conversations. We apply a standard machine learning algorithm to build a classifier that detects off-topic sections with better-than-chance accuracy and that begins to provide insight into the relative importance of features for identifying utterances as on topic or not.
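The abstract does not name the “standard machine learning algorithm” or the features the authors used. As a minimal sketch, assuming a bag-of-words representation and multinomial naive Bayes as a stand-in (not necessarily the authors' choice), the code below trains an on-topic/off-topic utterance classifier on a few hypothetical toy utterances. It also ranks words by how strongly they favor one class, which is a crude analogue of the feature-importance insight the abstract mentions. The data and the names `TRAIN`, `tokenize`, `NaiveBayes`, and `top_features` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy utterances with illustrative labels; NOT data from the paper.
TRAIN = [
    ("how is the weather over there today", "off"),
    ("oh wow that is funny", "off"),
    ("yeah uh huh right", "off"),
    ("i think the minimum wage should be raised", "on"),
    ("raising the wage would help workers", "on"),
    ("workers cannot live on the minimum wage", "on"),
]


def tokenize(text):
    # Hypothetical helper: lowercase whitespace tokenization of a transcript.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (a stand-in model)."""

    def __init__(self, examples):
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in examples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        self.n = len(examples)

    def log_likelihood(self, word, label):
        # Smoothed log P(word | label) over the training vocabulary.
        numerator = self.word_counts[label][word] + 1
        denominator = self.totals[label] + len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, text):
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / self.n)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # words unseen in training are ignored
                    score += self.log_likelihood(word, label)
            scores[label] = score
        return max(scores, key=scores.get)

    def top_features(self, label, other, k=3):
        # Words whose smoothed log-likelihood ratio most favors `label` over `other`.
        ratio = {w: self.log_likelihood(w, label) - self.log_likelihood(w, other)
                 for w in self.vocab}
        return sorted(ratio, key=ratio.get, reverse=True)[:k]


model = NaiveBayes(TRAIN)
print(model.predict("the wage is too low"))  # → on
print(model.predict("oh that is funny"))     # → off
```

On real conversational speech, the features would come from transcripts or recognizer output and could include non-lexical cues; this toy version only illustrates the classify-then-inspect-features workflow.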


Similar resources

A Boosting Approach to Topic Spotting on Subdialogues

We report the results of a study on topic spotting in conversational speech. Using a machine learning approach, we build classifiers that accept an audio file of conversational human speech as input, and output an estimate of the topic being discussed. Our methodology makes use of a well-known corpus of transcribed and topic-labeled speech (the Switchboard corpus), and involves an interesting do...


Confidence-Based Techniques for Rapid and Robust Topic Identification of Conversational Telephone Speech

We investigate the impact of automatic speech recognition errors on the accuracy of topic identification in conversational telephone speech. We present a modified TF-IDF feature-weighting calculation that provides significant robustness under various recognition error conditions. For our experiments we take conversations from the Fisher corpus to produce 1-best and lattice outputs using one reco...


Techniques for rapid and robust topic identification of conversational telephone speech

In this paper, we investigate the impact of automatic speech recognition (ASR) errors on the accuracy of topic identification in conversational telephone speech. We present a modified TF-IDF feature weighting calculation that provides significant robustness under various recognition error conditions. For our experiments we take conversations from the Fisher corpus to produce 1-best and lattice ...


HKUST/MTS: A Very Large Scale Mandarin Telephone Speech Corpus

The paper describes the design, collection, transcription and analysis of 200 hours of HKUST Mandarin Telephone Speech Corpus (HKUST/MTS) from over 2100 Mandarin speakers in mainland China under the DARPA EARS framework. The corpus includes speech data, transcriptions and speaker demographic information. The speech data include 1206 ten-minute natural Mandarin conversations between either stran...


Topic Identification from Audio Recordings Using Rich Recognition Results and Neural Network Based Classifiers

This paper investigates the use of a Neural Network classifier for topic identification from conversational telephone speech, which exploits rich recognition results coming from an automatic speech recognizer. The baseline features used to feed the neural classifier are produced using the words extracted from the 1-best sequence. Rich recognition results include the word union of the first n-be...




Publication date: 2006